

GROUP  SPARSE  OPTIMIZATION  BY  ALTERNATING  DIRECTION  METHOD 


WEI  DENG,  WOTAO  YIN,  AND  YIN  ZHANG* 

Abstract. This paper proposes efficient algorithms for group sparse optimization with mixed $\ell_{2,1}$-regularization, which
arises from the reconstruction of group sparse signals in compressive sensing, and the group Lasso problem in statistics and
machine learning. It is known that encoding the group information in addition to sparsity will lead to better signal recovery/feature
selection. The $\ell_{2,1}$-regularization promotes group sparsity, but the resulting problem, due to the mixed-norm structure and
possible grouping irregularity, is considered more difficult to solve than the conventional $\ell_1$-regularized problem. Our approach
is based on a variable splitting strategy and the classic alternating direction method (ADM). Two algorithms are presented,
one derived from the primal and the other from the dual of the $\ell_{2,1}$-regularized problem. The convergence of the proposed
algorithms is guaranteed by the existing ADM theory. General group configurations such as overlapping groups and incomplete
covers can be easily handled by our approach. Computational results show that on random problems the proposed ADM
algorithms exhibit good efficiency, and strong stability and robustness.


1. Introduction. In the last few years, finding sparse solutions to underdetermined linear systems
has become an active research topic, particularly in the areas of compressive sensing, statistics and machine
learning. Sparsity allows us to reconstruct high dimensional data with only a small number of samples.
In order to further enhance the recoverability, recent studies propose to go beyond sparsity and take into
account additional information about the underlying structure of the solutions. In practice, a wide class of
solutions is known to have a certain "group sparsity" structure. Namely, the solution has a natural grouping
of its components, and the components within a group are likely to be either all zeros or all nonzeros.
Encoding the group sparsity structure can reduce the degrees of freedom in the solution, thereby leading to
better recovery performance.

This paper focuses on the reconstruction of group sparse solutions from underdetermined linear
measurements, which is closely related to the Group Lasso problem [1] in statistics and machine learning.
It leads to various applications such as multiple kernel learning [2], microarray data analysis [3], channel
estimation in doubly dispersive multicarrier systems [5], etc. The group sparse reconstruction problem has
been well studied recently. A favorable approach in the literature is to use the mixed $\ell_{2,1}$-regularization.
Suppose $x \in \mathbb{R}^n$ is an unknown group sparse solution. Let $\{x_{g_i} \in \mathbb{R}^{n_i} : i = 1, \dots, s\}$ be the grouping of $x$,
where $g_i \subseteq \{1, 2, \dots, n\}$ is an index set corresponding to the $i$-th group, and $x_{g_i}$ denotes the subvector of $x$
indexed by $g_i$. Generally, the $g_i$'s can be any index sets, and they are predefined based on prior knowledge. The
$\ell_{2,1}$-norm is defined as follows:

$$\|x\|_{2,1} := \sum_{i=1}^{s} \|x_{g_i}\|_2. \tag{1.1}$$

Just like the use of $\ell_1$-regularization for sparse recovery, the $\ell_{2,1}$-regularization is known to facilitate group
sparsity and result in a convex problem. However, the $\ell_{2,1}$-regularized problem is generally considered
difficult to solve due to the non-smoothness and the mixed-norm structure. Although the $\ell_{2,1}$-problem
can be formulated as a second-order cone programming (SOCP) problem or a semidefinite programming
(SDP) problem, solving either SOCP or SDP by standard algorithms is computationally expensive. Several
efficient first-order algorithms have been proposed, e.g., a spectral projected gradient method (SPGL1) [5],
an accelerated gradient method (SLEP) [6], block-coordinate descent algorithms [7] and SpaRSA [8].

This paper proposes a new approach for solving the $\ell_{2,1}$-problem based on a variable splitting technique
and the alternating direction method (ADM). Recently, ADM has been successfully applied to a variety
of convex or nonconvex optimization problems, including $\ell_1$-regularized problems [9], total variation (TV)
regularized problems, and matrix factorization, completion and separation problems.
Witnessing the versatility of the ADM approach, we use it to tackle the $\ell_{2,1}$-regularized problem. We apply
the ADM approach to both the primal and dual forms of the $\ell_{2,1}$-problem and obtain closed form solutions
to all the resulting subproblems. Therefore, the derived algorithms have a convergence guarantee according
to the existing ADM theory. Preliminary numerical results demonstrate that our proposed algorithms are
fast, stable and robust, outperforming the previously known state-of-the-art algorithms.


1.1. Notation and Problem Formulation. Throughout the paper, we let matrices be denoted by
uppercase letters and vectors by lowercase letters. For a matrix $X$, we use $x^i$ and $x_j$ to represent its $i$-th
row and $j$-th column respectively.

To be more general, instead of using the $\ell_{2,1}$-norm (1.1), we consider the weighted $\ell_{2,1}$- (or $\ell_{w,2,1}$-) norm
defined by

$$\|x\|_{w,2,1} := \sum_{i=1}^{s} w_i \|x_{g_i}\|_2, \tag{1.2}$$

where $w_i \ge 0$ ($i = 1, \dots, s$) are weights associated with each group. Based on prior knowledge, properly
chosen weights may result in better recovery performance. For simplicity, we will assume the groups $\{x_{g_i} :
i = 1, \dots, s\}$ form a partition of $x$ unless otherwise specified. In Section 3, we will show that our approach can
be easily extended to general group configurations allowing overlapping and/or incomplete cover. Moreover,
adding weights inside groups is also discussed in Section 3.
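The definition (1.2) can be made concrete with a few lines of code. The following is a minimal sketch in plain Python; the function name `weighted_l21_norm`, the 0-based index lists, and the sample data are illustrative assumptions, not part of the paper.

```python
import math

def weighted_l21_norm(x, groups, weights):
    """Return sum_i w_i * ||x[g_i]||_2, where groups holds 0-based index lists."""
    total = 0.0
    for g, w in zip(groups, weights):
        total += w * math.sqrt(sum(x[j] ** 2 for j in g))
    return total

# Hypothetical example: three groups forming a partition of a 6-vector.
x = [3.0, 4.0, 0.0, 0.0, 5.0, 12.0]
groups = [[0, 1], [2, 3], [4, 5]]
unit = weighted_l21_norm(x, groups, [1.0, 1.0, 1.0])      # 5 + 0 + 13
weighted = weighted_l21_norm(x, groups, [2.0, 1.0, 0.5])  # 10 + 0 + 6.5
print(unit, weighted)
```

With unit weights the value reduces to the plain $\ell_{2,1}$-norm (1.1); a group that is entirely zero contributes nothing, which is why the norm favors solutions with few active groups.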

We consider the following basis pursuit (BP) model:

$$\min_x \ \|x\|_{w,2,1} \quad \text{s.t.} \quad Ax = b, \tag{1.3}$$

where $A \in \mathbb{R}^{m \times n}$ ($m < n$) and $b \in \mathbb{R}^m$. Without loss of generality, we assume $A$ has full rank. When
the measurement vector $b$ contains noise, the basis pursuit denoising (BPDN) models are commonly used,
including the constrained form:

$$\min_x \ \|x\|_{w,2,1} \quad \text{s.t.} \quad \|Ax - b\|_2 \le \sigma, \tag{1.4}$$

and the unconstrained form:

$$\min_x \ \|x\|_{w,2,1} + \frac{1}{2\mu}\|Ax - b\|_2^2, \tag{1.5}$$


where $\sigma > 0$ and $\mu > 0$ are parameters. In this paper, we will stay focused on the basis pursuit model (1.3).
The derivation of the ADM algorithms for the basis pursuit denoising models (1.4) and (1.5) follows similarly.
Moreover, we emphasize that the basis pursuit model (1.3) is also good for noisy data if the iterations are
stopped properly prior to convergence based on the noise level.


1.2. Outline of the Paper. The paper is organized as follows. Section 2 presents two ADM algorithms,
one derived from the primal and the other from the dual of the $\ell_{w,2,1}$-problem, and states the convergence
results following from the literature. For simplicity, Section 2 assumes the grouping is a partition of the
solution. In Section 3, we generalize the group configurations to overlapping groups and incomplete cover,
and discuss adding weights inside groups. Section 4 presents the ADM schemes for the jointly sparse recovery
problem, also known as the multiple measurement vector (MMV) problem, as a special case of the group
sparse recovery problem. In Section 5, we report numerical results on random problems and demonstrate
the efficiency of the ADM algorithms in comparison with the state-of-the-art algorithm SPGL1.


2. ADM-based First-Order Primal-Dual Algorithms. In this section, we apply the classic
alternating direction method to both the primal and dual forms of the $\ell_{w,2,1}$-problem (1.3).
The derived algorithms are efficient first-order algorithms and are of primal-dual nature because both primal
and dual variables are updated at each iteration. The convergence of the algorithms is established by the
existing ADM theory.


2.1. Applying ADM to the Primal Problem. In order to apply ADM to the primal $\ell_{w,2,1}$-problem
(1.3), we first introduce an auxiliary variable and transform it into an equivalent problem:

$$\min_{x,z} \ \|z\|_{w,2,1} = \sum_{i=1}^{s} w_i\|z_{g_i}\|_2 \quad \text{s.t.} \quad z = x, \ Ax = b. \tag{2.1}$$


Note that problem (2.1) has two blocks of variables ($x$ and $z$) and its objective function is separable in the
form of $f(x) + g(z)$ since it only involves $z$; thus ADM is applicable. The augmented Lagrangian problem is
of the form

$$\min_{x,z} \ \|z\|_{w,2,1} - \lambda_1^T(z - x) + \frac{\beta_1}{2}\|z - x\|_2^2 - \lambda_2^T(Ax - b) + \frac{\beta_2}{2}\|Ax - b\|_2^2, \tag{2.2}$$

where $\lambda_1 \in \mathbb{R}^n$, $\lambda_2 \in \mathbb{R}^m$ are multipliers and $\beta_1, \beta_2 > 0$ are penalty parameters.


Then we apply the ADM approach, i.e., we minimize the augmented Lagrangian problem (2.2) with
respect to $x$ and $z$ alternately. The $x$-subproblem, namely minimizing (2.2) with respect to $x$, is given by

$$\min_x \ \lambda_1^T x + \frac{\beta_1}{2}\|z - x\|_2^2 - \lambda_2^T Ax + \frac{\beta_2}{2}\|Ax - b\|_2^2
\iff \min_x \ \frac{1}{2}x^T(\beta_1 I + \beta_2 A^T A)x - (\beta_1 z - \lambda_1 + \beta_2 A^T b + A^T\lambda_2)^T x. \tag{2.3}$$

Note that it is a convex quadratic problem, hence it reduces to solving the following linear system:

$$(\beta_1 I + \beta_2 A^T A)x = \beta_1 z - \lambda_1 + \beta_2 A^T b + A^T\lambda_2. \tag{2.4}$$


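Minimizing (2.2) with respect to $z$ gives the following $z$-subproblem:

$$\min_z \ \|z\|_{w,2,1} - \lambda_1^T z + \frac{\beta_1}{2}\|z - x\|_2^2. \tag{2.5}$$

Simple manipulation shows that (2.5) is equivalent to

$$\min_z \ \sum_{i=1}^{s}\left[ w_i\|z_{g_i}\|_2 + \frac{\beta_1}{2}\left\|z_{g_i} - x_{g_i} - \frac{1}{\beta_1}(\lambda_1)_{g_i}\right\|_2^2 \right], \tag{2.6}$$

which has a closed form solution by the one-dimensional shrinkage (or soft thresholding) formula:

$$z_{g_i} = \max\left\{\|r_i\|_2 - \frac{w_i}{\beta_1},\ 0\right\}\frac{r_i}{\|r_i\|_2}, \quad \text{for } i = 1, \dots, s, \tag{2.7}$$

where

$$r_i := x_{g_i} + \frac{1}{\beta_1}(\lambda_1)_{g_i}, \tag{2.8}$$

and the convention $0 \cdot \frac{0}{0} = 0$ is followed. We let the above group-wise shrinkage operation be denoted by
$z = \mathrm{Shrink}(x + \frac{1}{\beta_1}\lambda_1, \frac{1}{\beta_1}w)$ for short.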
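The group-wise shrinkage in (2.7) and (2.8) is simple to implement. Below is a minimal sketch in plain Python; the name `group_shrink`, the list-based representation, and the sample numbers are assumptions made for illustration. The input `r` plays the role of $x + \lambda_1/\beta_1$ and `thresholds[i]` the role of $w_i/\beta_1$.

```python
import math

def group_shrink(r, groups, thresholds):
    """Group-wise soft thresholding: z_g = max(||r_g|| - t, 0) * r_g / ||r_g||."""
    z = [0.0] * len(r)
    for g, t in zip(groups, thresholds):
        norm = math.sqrt(sum(r[j] ** 2 for j in g))
        # If norm <= t (including norm == 0), the whole group is set to zero,
        # which matches the convention 0 * (0/0) = 0.
        if norm > t:
            scale = (norm - t) / norm
            for j in g:
                z[j] = scale * r[j]
    return z

# Hypothetical example: the first group (norm 5) is shrunk, the second is zeroed.
z = group_shrink([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], [1.0, 1.0])
print(z)
```

Each group is either scaled toward the origin by a common factor or set to zero as a whole, so the components of a group enter or leave the support together.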


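Finally, the multipliers $\lambda_1$ and $\lambda_2$ are updated in the standard way

$$\begin{cases} \lambda_1 \leftarrow \lambda_1 - \gamma_1\beta_1(z - x), \\ \lambda_2 \leftarrow \lambda_2 - \gamma_2\beta_2(Ax - b), \end{cases} \tag{2.9}$$

where $\gamma_1, \gamma_2 > 0$ are step lengths.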

In short, we have derived an ADM iteration scheme for (2.1) as follows:

Algorithm 1: Primal-Based ADM for Group Sparsity

  Initialize $z \in \mathbb{R}^n$, $\lambda_1 \in \mathbb{R}^n$, $\lambda_2 \in \mathbb{R}^m$, $\beta_1, \beta_2 > 0$ and $\gamma_1, \gamma_2 > 0$;
  while stopping criterion is not met do
    $x \leftarrow (\beta_1 I + \beta_2 A^T A)^{-1}(\beta_1 z - \lambda_1 + \beta_2 A^T b + A^T\lambda_2)$;
    $z \leftarrow \mathrm{Shrink}(x + \frac{1}{\beta_1}\lambda_1, \frac{1}{\beta_1}w)$ (group-wise);
    $\lambda_1 \leftarrow \lambda_1 - \gamma_1\beta_1(z - x)$;
    $\lambda_2 \leftarrow \lambda_2 - \gamma_2\beta_2(Ax - b)$;
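The primal-based iteration can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `group_shrink` and `primal_adm`, the fixed iteration count in place of a stopping criterion, a single step length `gamma` for both multipliers, and the toy problem are all hypothetical choices.

```python
import numpy as np

def group_shrink(r, groups, t):
    """Group-wise soft thresholding, as in (2.7)."""
    z = np.zeros_like(r)
    for g, tg in zip(groups, t):
        nrm = np.linalg.norm(r[g])
        if nrm > tg:                      # otherwise the whole group stays zero
            z[g] = (1.0 - tg / nrm) * r[g]
    return z

def primal_adm(A, b, groups, w, beta1=1.0, beta2=1.0, gamma=1.0, iters=500):
    """Run a fixed number of primal-based ADM iterations for min ||x||_{w,2,1} s.t. Ax = b."""
    w = np.asarray(w, dtype=float)
    m, n = A.shape
    z, lam1, lam2 = np.zeros(n), np.zeros(n), np.zeros(m)
    M = beta1 * np.eye(n) + beta2 * (A.T @ A)   # system matrix, fixed across iterations
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, beta1 * z - lam1 + beta2 * Atb + A.T @ lam2)
        z = group_shrink(x + lam1 / beta1, groups, w / beta1)
        lam1 = lam1 - gamma * beta1 * (z - x)
        lam2 = lam2 - gamma * beta2 * (A @ x - b)
    return x, z

# Toy instance (an assumption for illustration): Ax = x[0:2] + x[2:4] = b.
A = np.hstack([np.eye(2), np.eye(2)])
b = np.array([0.6, 0.8])
x, z = primal_adm(A, b, groups=[[0, 1], [2, 3]], w=[1.0, 2.0])
print(np.round(x, 6))
```

In this toy instance the second group carries twice the weight of the first, so by the triangle inequality the minimizer places all of $b$ in the first group, i.e., $x = (0.6, 0.8, 0, 0)$, and the iterates should approach that point. For larger problems one would factor the system matrix once instead of calling a dense solver in every iteration.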


Since the above ADM scheme computes the exact solution for each subproblem, its convergence is
guaranteed by the existing ADM theory. We state the convergence result by the following theorem.

Theorem 2.1. For $\beta_1, \beta_2 > 0$ and $\gamma_1, \gamma_2 \in \left(0, \frac{\sqrt{5}+1}{2}\right)$, the sequence $\{(x^{(k)}, z^{(k)})\}$ generated by
Algorithm 1 from any initial point $(x^{(0)}, z^{(0)})$ converges to $(x^*, z^*)$, where $(x^*, z^*)$ is a solution of (2.1).


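2.2. Applying ADM to the Dual Problem. Now we apply the ADM technique to the dual form
of the $\ell_{w,2,1}$-problem (1.3) and derive an equally simple yet more efficient algorithm.

The dual of (1.3) is given by

$$\begin{aligned}
&\max_y \left\{ \min_x \ \sum_{i=1}^{s} w_i\|x_{g_i}\|_2 - y^T(Ax - b) \right\} \\
=\ &\max_y \left\{ b^T y + \min_x \ \sum_{i=1}^{s}\left( w_i\|x_{g_i}\|_2 - y^T A_{g_i}x_{g_i} \right) \right\} \\
=\ &\max_y \left\{ b^T y : \|A_{g_i}^T y\|_2 \le w_i, \ \text{for } i = 1, \dots, s \right\},
\end{aligned} \tag{2.10}$$

where $y \in \mathbb{R}^m$, and $A_{g_i}$ represents the submatrix collecting the columns of $A$ that correspond to the $i$-th group.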
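The last equality in (2.10) rests on evaluating the inner minimization group by group. As a brief worked step (a standard argument added here for completeness, not quoted from the paper), for each $i$ we have

$$\min_{x_{g_i}} \left( w_i\|x_{g_i}\|_2 - y^T A_{g_i}x_{g_i} \right) =
\begin{cases} 0, & \text{if } \|A_{g_i}^T y\|_2 \le w_i, \\ -\infty, & \text{otherwise.} \end{cases}$$

Indeed, by the Cauchy-Schwarz inequality $y^T A_{g_i}x_{g_i} \le \|A_{g_i}^T y\|_2\|x_{g_i}\|_2$, so the objective is bounded below by $(w_i - \|A_{g_i}^T y\|_2)\|x_{g_i}\|_2 \ge 0$ whenever the constraint holds, and the value $0$ is attained at $x_{g_i} = 0$. If instead $\|A_{g_i}^T y\|_2 > w_i$, taking $x_{g_i} = tA_{g_i}^T y$ gives the value $t\,\|A_{g_i}^T y\|_2\,(w_i - \|A_{g_i}^T y\|_2)$, which tends to $-\infty$ as $t \to \infty$.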
Similarly, we introduce a splitting variable to reformulate it as a two-block separable problem:

$$\min_{y,z} \ -b^T y \quad \text{s.t.} \quad z = A^T y, \quad \|z_{g_i}\|_2 \le w_i, \ \text{for } i = 1, \dots, s, \tag{2.11}$$


whose associated augmented Lagrangian problem is

$$\min_{y,z} \ -b^T y - x^T(z - A^T y) + \frac{\beta}{2}\|z - A^T y\|_2^2 \quad \text{s.t.} \quad \|z_{g_i}\|_2 \le w_i, \ \text{for } i = 1, \dots, s, \tag{2.12}$$

where $\beta > 0$ is a penalty parameter, and $x \in \mathbb{R}^n$ is a multiplier and essentially the primal variable.


Then we apply the alternating minimization idea to (2.12). The $y$-subproblem is a convex quadratic
problem:

$$\min_y \ -b^T y + (Ax)^T y + \frac{\beta}{2}\|z - A^T y\|_2^2, \tag{2.13}$$

which can be further reduced to the following linear system:

$$\beta AA^T y = b - Ax + \beta Az. \tag{2.14}$$


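The $z$-subproblem is given by

$$\min_z \ -x^T z + \frac{\beta}{2}\|z - A^T y\|_2^2 \quad \text{s.t.} \quad \|z_{g_i}\|_2 \le w_i, \ \text{for } i = 1, \dots, s. \tag{2.15}$$

We can equivalently reformulate it as

$$\min_z \ \sum_{i=1}^{s}\frac{\beta}{2}\left\|z_{g_i} - A_{g_i}^T y - \frac{1}{\beta}x_{g_i}\right\|_2^2 \quad \text{s.t.} \quad \|z_{g_i}\|_2 \le w_i, \ \text{for } i = 1, \dots, s. \tag{2.16}$$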


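It is easy to show that the solution to (2.16) is given by

$$z_{g_i} = \mathcal{P}_{\mathbf{B}_2^i}\left(A_{g_i}^T y + \frac{1}{\beta}x_{g_i}\right), \quad \text{for } i = 1, \dots, s. \tag{2.17}$$

Here $\mathcal{P}$ represents a projection (in Euclidean norm) onto a convex set denoted as a subscript and $\mathbf{B}_2^i := \{z \in
\mathbb{R}^{n_i} : \|z\|_2 \le w_i\}$. In short,

$$z = \mathcal{P}_{\mathbf{B}_2}\left(A^T y + \frac{1}{\beta}x\right), \tag{2.18}$$

where $\mathbf{B}_2 := \{z \in \mathbb{R}^n : \|z_{g_i}\|_2 \le w_i, \ \text{for } i = 1, \dots, s\}$. Finally we update the multiplier (i.e., the primal
variable) $x$ by

$$x \leftarrow x - \gamma\beta(z - A^T y), \tag{2.19}$$

where $\gamma > 0$ is a step length.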




Therefore, we have derived an ADM iteration scheme for (2.11) as follows:

Algorithm 2: Dual-Based ADM for Group Sparsity

  Initialize $x \in \mathbb{R}^n$, $z \in \mathbb{R}^n$, $\beta > 0$ and $\gamma > 0$;
  while stopping criterion is not met do
    $y \leftarrow (\beta AA^T)^{-1}(b - Ax + \beta Az)$;
    $z \leftarrow \mathcal{P}_{\mathbf{B}_2}(A^T y + \frac{1}{\beta}x)$ (group-wise);
    $x \leftarrow x - \gamma\beta(z - A^T y)$;
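The dual-based iteration admits an equally short NumPy sketch. As before, this is a hedged illustration and not the authors' code: the names `project_groups` and `dual_adm`, the fixed iteration count, and the toy data are assumptions.

```python
import numpy as np

def project_groups(v, groups, w):
    """Project each subvector v[g_i] onto the l2-ball of radius w_i, as in (2.17)."""
    z = v.copy()
    for g, wi in zip(groups, w):
        nrm = np.linalg.norm(v[g])
        if nrm > wi:                      # inside the ball: left unchanged
            z[g] = (wi / nrm) * v[g]
    return z

def dual_adm(A, b, groups, w, beta=1.0, gamma=1.0, iters=500):
    """Run a fixed number of dual-based ADM iterations; x is the multiplier (primal variable)."""
    m, n = A.shape
    x, z = np.zeros(n), np.zeros(n)
    AAt = A @ A.T                         # equals I when A has orthonormal rows
    for _ in range(iters):
        y = np.linalg.solve(beta * AAt, b - A @ x + beta * (A @ z))
        z = project_groups(A.T @ y + x / beta, groups, w)
        x = x - gamma * beta * (z - A.T @ y)
    return x, y, z

# Toy instance (an assumption for illustration): Ax = x[0:2] + x[2:4] = b.
A = np.hstack([np.eye(2), np.eye(2)])
b = np.array([0.6, 0.8])
x, y, z = dual_adm(A, b, groups=[[0, 1], [2, 3]], w=[1.0, 2.0])
print(np.round(x, 6), np.round(y, 6))
```

For this toy instance the dual problem is to maximize $b^T y$ subject to $\|y\|_2 \le 1$, so the dual solution is $y = b/\|b\|_2 = (0.6, 0.8)$ and the primal solution is $x = (0.6, 0.8, 0, 0)$; the iterates should approach these values. Note that the only linear system involves the $m \times m$ matrix $AA^T$.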


Note that each subproblem is solved exactly in Algorithm 2. Hence the convergence result follows from
the existing ADM theory.

Theorem 2.2. For $\beta > 0$ and $\gamma \in \left(0, \frac{\sqrt{5}+1}{2}\right)$, the sequence $\{(x^{(k)}, y^{(k)}, z^{(k)})\}$ generated by Algorithm 2
from any initial point $(x^{(0)}, y^{(0)}, z^{(0)})$ converges to $(x^*, y^*, z^*)$, where $x^*$ is a solution of (1.3), and $(y^*, z^*)$
is a solution of (2.11).


2.3. Remarks. In the primal-based ADM scheme, the update of $x$ is written in the form of solving an
$n \times n$ linear system or inverting an $n \times n$ matrix. In fact, it can be reduced to solving a smaller $m \times m$ linear
system or inverting an $m \times m$ matrix by the Sherman-Morrison-Woodbury formula:

$$(\beta_1 I + \beta_2 A^T A)^{-1} = \frac{1}{\beta_1}I - \frac{\beta_2}{\beta_1}A^T(\beta_1 I + \beta_2 AA^T)^{-1}A. \tag{2.20}$$

Note that in many compressive sensing applications, $A$ is formed by randomly taking a subset of rows from
orthonormal transform matrices, e.g., the discrete cosine transform (DCT) matrix, the discrete Fourier
transform (DFT) matrix and the discrete Walsh-Hadamard transform (DWHT) matrix. Then $A$ has orthonormal
rows, i.e., $AA^T = I$. In this case, there is no need to solve a linear system for either the primal-based ADM
scheme or the dual-based scheme. The main computational cost becomes two matrix-vector multiplications
per iteration for both schemes.

For a general matrix $A$, solving the $m \times m$ linear system becomes the most costly part. However, we only
need to compute the matrix inverse or do the matrix factorization once. Therefore, the computational cost
per iteration is still $O(mn)$. For large problems where solving such an $m \times m$ linear system is no longer
affordable, we may just take a steepest descent step instead. In this case, the subproblem is solved inexactly.
Hence its convergence remains an issue for further research. However, empirical evidence shows that the
algorithms still converge well. By taking a steepest descent step, our ADM algorithms only involve
matrix-vector multiplications. Consequently, $A$ can be accepted as two linear operators $A * (\cdot)$ and $A^T * (\cdot)$, and the
storage of the matrix $A$ may not be needed.
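The reduction from an $n \times n$ to an $m \times m$ system via (2.20) can be checked numerically. The sketch below is illustrative only; the helper name `solve_x_smw`, the random test matrix, and the parameter values are assumptions.

```python
import numpy as np

def solve_x_smw(A, rhs, beta1, beta2):
    """Solve (beta1*I + beta2*A^T A) x = rhs using only an m-by-m linear system."""
    m = A.shape[0]
    small = beta1 * np.eye(m) + beta2 * (A @ A.T)          # m x m matrix
    return rhs / beta1 - (beta2 / beta1) * (A.T @ np.linalg.solve(small, A @ rhs))

rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.standard_normal((m, n))
rhs = rng.standard_normal(n)
beta1, beta2 = 0.7, 2.5

x_smw = solve_x_smw(A, rhs, beta1, beta2)
x_direct = np.linalg.solve(beta1 * np.eye(n) + beta2 * (A.T @ A), rhs)
print(np.allclose(x_smw, x_direct))
```

When $AA^T = I$, the small matrix is simply $(\beta_1 + \beta_2)I$, so the solve collapses to $x = \frac{1}{\beta_1}\left(\mathrm{rhs} - \frac{\beta_2}{\beta_1 + \beta_2}A^T A\,\mathrm{rhs}\right)$, which costs two matrix-vector multiplications.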


3. Several Extensions. So far, we have presumed that the grouping $\{x_{g_1}, \dots, x_{g_s}\}$ in the problem
formulation (1.3) is a non-overlapping cover of $x$. It is of practical importance to consider more general
group configurations such as overlapping groups and incomplete cover. Furthermore, it can be desirable to
introduce weights inside each group for better scaling. In this section, we will demonstrate that our approach
can be easily extended to these general settings.

3.1. Overlapping Groups. Overlapping group structure commonly arises in many applications. For
instance, in microarray data analysis, gene expression data are known to form overlapping groups since each
gene may participate in multiple functional groups [18]. The weighted $\ell_{2,1}$-regularization is still applicable,
yielding the same formulation as in (1.3). However, $\{x_{g_1}, \dots, x_{g_s}\}$ now may have overlaps, which makes the
problem more challenging to solve. As we will show, our approach can handle this difficulty.

Using the same strategy as before, we first introduce auxiliary variables $z_i$'s and let $z_i = x_{g_i}$ ($i = 1, \dots, s$),
yielding the following equivalent problem:

$$\min_{x,z} \ \sum_{i=1}^{s} w_i\|z_i\|_2 \quad \text{s.t.} \quad z = \tilde{x}, \ Ax = b, \tag{3.1}$$


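where $z = [z_1^T, \dots, z_s^T]^T \in \mathbb{R}^{\tilde{n}}$, $\tilde{x} = [x_{g_1}^T, \dots, x_{g_s}^T]^T \in \mathbb{R}^{\tilde{n}}$ and $\tilde{n} = \sum_{i=1}^{s} n_i \ge n$. The augmented Lagrangian
problem is of the form:

$$\min_{x,z} \ \sum_{i=1}^{s} w_i\|z_i\|_2 - \lambda_1^T(z - \tilde{x}) + \frac{\beta_1}{2}\|z - \tilde{x}\|_2^2 - \lambda_2^T(Ax - b) + \frac{\beta_2}{2}\|Ax - b\|_2^2, \tag{3.2}$$

where $\lambda_1 \in \mathbb{R}^{\tilde{n}}$, $\lambda_2 \in \mathbb{R}^m$ are multipliers, and $\beta_1, \beta_2 > 0$ are penalty parameters.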

Then we perform alternating minimization in the $x$ and $z$ directions. The benefit of our variable splitting
technique is that the weighted $\ell_{2,1}$-regularization term no longer contains overlapping groups of variables
$x_{g_i}$'s. Instead, it only involves the $z_i$'s, which do not overlap, thereby allowing us to easily perform exact
minimization for the $z$-subproblem just as in the non-overlapping case. The closed form solution of the
$z$-subproblem is given by the shrinkage formula for each group of variables $z_i$, the same as in (2.7) and (2.8).
We note that the $x$-subproblem is a convex quadratic problem. Thus, the overlapping feature of $x$ does not
bring much difficulty. Clearly, $\tilde{x}$ can be represented by

$$\tilde{x} = Gx, \tag{3.3}$$

and each row of $G \in \mathbb{R}^{\tilde{n} \times n}$ has a single 1 and 0's elsewhere. The $x$-subproblem is given by

$$\min_x \ \lambda_1^T Gx + \frac{\beta_1}{2}\|z - Gx\|_2^2 - \lambda_2^T Ax + \frac{\beta_2}{2}\|Ax - b\|_2^2, \tag{3.4}$$

which is equivalent to solving the following linear system:

$$(\beta_1 G^T G + \beta_2 A^T A)x = \beta_1 G^T z - G^T\lambda_1 + \beta_2 A^T b + A^T\lambda_2. \tag{3.5}$$

Note that $G^T G \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose $i$-th diagonal entry is the number of repetitions of $x_i$ in
$\tilde{x}$. When the groups form a complete cover of the solution, the diagonal entries of $G^T G$ will be positive,
so $G^T G$ is invertible. In the next subsection, we will show that an incomplete cover case can be converted
to a complete cover case by introducing an auxiliary group. Therefore, we can generally assume $G^T G$ is
invertible. Then the Sherman-Morrison-Woodbury formula is applicable, and solving this $n \times n$ linear system
can be further reduced to solving an $m \times m$ linear system.
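To make the role of $G$ concrete, the following sketch builds it for a small overlapping configuration and inspects $G^T G$. The helper name `build_G`, the 0-based indices, and the example groups are hypothetical.

```python
import numpy as np

def build_G(groups, n):
    """Stack the groups: row k of G selects the k-th entry of [x_{g_1}; ...; x_{g_s}]."""
    rows = [j for g in groups for j in g]
    G = np.zeros((len(rows), n))
    G[np.arange(len(rows)), rows] = 1.0   # a single 1 per row
    return G

# Three overlapping groups covering a 5-vector (indices 0, 2 and 3 are shared).
groups = [[0, 1, 2], [2, 3], [3, 4, 0]]
G = build_G(groups, 5)
x = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

x_tilde = G @ x                 # stacked copies of the groups
counts = np.diag(G.T @ G)       # how often each component is repeated
print(x_tilde, counts)
```

Since $G^T G$ is diagonal, the matrix $\beta_1 G^T G$ in the linear system for $x$ is trivial to invert, which is what keeps the Sherman-Morrison-Woodbury reduction available in the overlapping case.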


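We can also formulate the dual problem of (3.1) as follows:

$$\begin{aligned}
&\max_{y,p} \left\{ \min_{x,z} \ \sum_{i=1}^{s} w_i\|z_i\|_2 - y^T(Ax - b) - p^T(z - Gx) \right\} \\
=\ &\max_{y,p} \left\{ b^T y + \min_z \ \sum_{i=1}^{s}\left( w_i\|z_i\|_2 - p_i^T z_i \right) + \min_x \ (-A^T y + G^T p)^T x \right\} \\
=\ &\max_{y,p} \left\{ b^T y : G^T p = A^T y, \ \|p_i\|_2 \le w_i, \ \text{for } i = 1, \dots, s \right\},
\end{aligned} \tag{3.6}$$

where $y \in \mathbb{R}^m$, $p = [p_1^T, \dots, p_s^T]^T \in \mathbb{R}^{\tilde{n}}$ and $p_i \in \mathbb{R}^{n_i}$ ($i = 1, \dots, s$).

We introduce a splitting variable $q \in \mathbb{R}^{\tilde{n}}$ and obtain an equivalent problem:

$$\min_{y,p,q} \ -b^T y \quad \text{s.t.} \quad G^T p = A^T y, \quad p = q, \quad \|q_i\|_2 \le w_i, \ \text{for } i = 1, \dots, s. \tag{3.7}$$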


Likewise, we minimize its augmented Lagrangian by the alternating direction method. Notice that the
$(y, p)$-subproblem is a convex quadratic problem, and the $q$-subproblem has a closed form solution by projection
onto $\ell_2$-norm balls. Therefore, a similar dual-based ADM algorithm can be derived. For the sake of brevity,
we omit the derivation here.


3.2. Incomplete Cover. In some applications such as group sparse logistic regression, the groups
may be an incomplete cover of the solution because only partial components are sparse. This case can
be easily dealt with by introducing a new group containing the uncovered components, i.e., letting $\bar{g} =
\{1, \dots, n\} \setminus \bigcup_{i=1}^{s} g_i$. Then we can include this group $\bar{g}$ in the $\ell_{w,2,1}$-regularization and associate it with a zero
or tiny weight.
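The conversion of an incomplete cover into a complete one amounts to a set difference. A minimal sketch in plain Python follows; the function name `add_complement_group`, the default weight of zero, and the example are assumptions for illustration.

```python
def add_complement_group(groups, weights, n, extra_weight=0.0):
    """Append the group of uncovered indices (0-based) with a zero or tiny weight."""
    covered = {j for g in groups for j in g}
    rest = [j for j in range(n) if j not in covered]
    if not rest:                       # already a complete cover: nothing to add
        return list(groups), list(weights)
    return list(groups) + [rest], list(weights) + [extra_weight]

# Hypothetical example: indices 2 and 4 of a 5-vector are not covered.
new_groups, new_weights = add_complement_group([[0, 1], [3]], [1.0, 1.0], n=5)
print(new_groups, new_weights)
```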


3.3. Weights Inside Groups. Although we have considered a weighted version of the $\ell_{2,1}$-norm
(1.2), the weights are only added between the groups. In other words, components within a group are
associated with the same weight. In applications such as multi-modal sensing/classification, components of
each group are likely to have a large dynamic range. Introducing weights inside each group can balance the
different scales of the components, thereby improving the accuracy and stability of the reconstruction.

Thus, we consider the weighted $\ell_2$-norm in place of the $\ell_2$-norm in the definition of the $\ell_{w,2,1}$-norm (1.2).
For $x \in \mathbb{R}^n$, the weighted $\ell_2$-norm is given by

$$\|x\|_{W,2} := \|Wx\|_2, \tag{3.8}$$

where $W = \mathrm{diag}([w_1, \dots, w_n])$ is a diagonal matrix with weights on its diagonal and $w_i > 0$ ($i = 1, \dots, n$).
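With weights inside each group, the problem (1.3) becomes

$$\min_x \ \sum_{i=1}^{s} w_i\|W^{(i)}x_{g_i}\|_2 \quad \text{s.t.} \quad Ax = b, \tag{3.9}$$

where $W^{(i)} \in \mathbb{R}^{n_i \times n_i}$ is a diagonal weight matrix for the $i$-th group. After a change of variable by letting
$z_i = W^{(i)}x_{g_i}$ ($i = 1, \dots, s$), it can be reformulated as

$$\min_{x,z} \ \sum_{i=1}^{s} w_i\|z_i\|_2 \quad \text{s.t.} \quad z = WGx, \ Ax = b, \tag{3.10}$$

where $z = [z_1^T, \dots, z_s^T]^T \in \mathbb{R}^{\tilde{n}}$, $Gx = [x_{g_1}^T, \dots, x_{g_s}^T]^T \in \mathbb{R}^{\tilde{n}}$, $\tilde{n} = \sum_{i=1}^{s} n_i \ge n$ and

$$W := \begin{pmatrix} W^{(1)} & & \\ & \ddots & \\ & & W^{(s)} \end{pmatrix}.$$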


Then  the  problem  can  be  addressed  within  our  framework. 


4. Joint Sparsity. Now we study an interesting special case of the group sparsity structure called joint
sparsity. Jointly sparse solutions, namely, a set of sparse solutions that share a common nonzero support, arise
in cognitive radio networks [19], distributed compressive sensing [20], direction-of-arrival estimation in radar
[21], magnetic resonance imaging with multiple coils [22] and many other applications. The reconstruction
of jointly sparse solutions, also known as the multiple measurement vector (MMV) problem, has its origin
in sensor array signal processing and recently has received much interest as an extension of the single sparse
solution recovery in compressive sensing.

The (weighted) $\ell_{2,1}$-regularization has been popularly used to encode the joint sparsity, given by

$$\min_X \ \|X\|_{w,2,1} := \sum_{i=1}^{n} w_i\|x^i\|_2 \quad \text{s.t.} \quad AX = B, \tag{4.1}$$

where $X = [x_1, \dots, x_l] \in \mathbb{R}^{n \times l}$ denotes a collection of $l$ jointly sparse solutions, $A \in \mathbb{R}^{m \times n}$ ($m < n$), $B \in
\mathbb{R}^{m \times l}$ and $w_i \ge 0$ for $i = 1, \dots, n$. Recall that $x^i$ and $x_j$ denote the $i$-th row and $j$-th column of $X$ respectively.


Indeed, joint sparsity can be viewed as a special non-overlapping group sparsity structure with each
group containing one row of the solution matrix. Clearly, the joint $\ell_{w,2,1}$-norm given in (4.1) is consistent
with the definition of the group $\ell_{w,2,1}$-norm (1.3). Further, we can cast problem (4.1) in the form of a group
sparsity problem. Let us define

$$\tilde{A} := I_l \otimes A = \begin{pmatrix} A & & \\ & \ddots & \\ & & A \end{pmatrix}, \quad
x := \mathrm{vec}(X) = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_l \end{pmatrix} \quad \text{and} \quad
b := \mathrm{vec}(B) = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_l \end{pmatrix}, \tag{4.2}$$

where $I_l \in \mathbb{R}^{l \times l}$ is the identity matrix, and $\mathrm{vec}(\cdot)$ and $\otimes$ are standard notations for the vectorization of a matrix
and the Kronecker product respectively. We partition $x$ into $n$ groups $\{x_{g_1}, \dots, x_{g_n}\}$, where $x_{g_i} \in \mathbb{R}^l$ ($i =
1, \dots, n$) corresponds to the $i$-th row of the matrix $X$. Then it is easy to see that problem (4.1) is equivalent
to the following group $\ell_{2,1}$-problem:

$$\min_x \ \|x\|_{w,2,1} := \sum_{i=1}^{n} w_i\|x_{g_i}\|_2 \quad \text{s.t.} \quad \tilde{A}x = b. \tag{4.3}$$
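The equivalence between (4.1) and (4.3) can be verified numerically on a small instance. The sketch below is illustrative; the dimensions, the random data, and the helper `vec` are assumptions, and the groups use 0-based indices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, l = 3, 5, 4
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, l))

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.flatten(order="F")

A_tilde = np.kron(np.eye(l), A)          # block diagonal with l copies of A
x = vec(X)

# Group i collects the entries of vec(X) that come from the i-th row of X.
groups = [np.arange(i, n * l, n) for i in range(n)]

same_constraint = np.allclose(A_tilde @ x, vec(A @ X))
row_norms = np.linalg.norm(X, axis=1).sum()
group_norms = sum(np.linalg.norm(x[g]) for g in groups)
print(same_constraint, np.isclose(row_norms, group_norms))
```

In practice one would not form $\tilde{A}$ explicitly; the matrix forms of the iterations operate on $X$ directly.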


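Moreover, under the joint sparsity setting, our primal-based ADM scheme for (4.3) has the following form:

$$\begin{cases}
X \leftarrow (\beta_1 I + \beta_2 A^T A)^{-1}(\beta_1 Z - \Lambda_1 + \beta_2 A^T B + A^T\Lambda_2), \\
Z \leftarrow \mathrm{Shrink}(X + \frac{1}{\beta_1}\Lambda_1, \frac{1}{\beta_1}w) \ \text{(row-wise)}, \\
\Lambda_1 \leftarrow \Lambda_1 - \gamma_1\beta_1(Z - X), \\
\Lambda_2 \leftarrow \Lambda_2 - \gamma_2\beta_2(AX - B).
\end{cases} \tag{4.4}$$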


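Here $\Lambda_1 \in \mathbb{R}^{n \times l}$, $\Lambda_2 \in \mathbb{R}^{m \times l}$ are multipliers, $\beta_1, \beta_2 > 0$ are penalty parameters, $\gamma_1, \gamma_2 > 0$ are step lengths
and the updating of $Z$ by row-wise shrinkage represents

$$z^i = \max\left\{\|r^i\|_2 - \frac{w_i}{\beta_1},\ 0\right\}\frac{r^i}{\|r^i\|_2}, \quad \text{for } i = 1, \dots, n, \tag{4.5}$$

where

$$r^i := x^i + \frac{1}{\beta_1}\lambda_1^i, \tag{4.6}$$

with $\lambda_1^i$ denoting the $i$-th row of $\Lambda_1$.

Correspondingly, the dual of (4.1) is given by

$$\max_Y \ B \bullet Y \quad \text{s.t.} \quad \|A_i^T Y\|_2 \le w_i, \ \text{for } i = 1, \dots, n, \tag{4.7}$$

where "$\bullet$" denotes the sum of component-wise products.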


The dual-based ADM scheme for the joint sparsity problem is of the following form:

$$\begin{cases}
Y \leftarrow (\beta AA^T)^{-1}(B - AX + \beta AZ), \\
Z \leftarrow \mathcal{P}_{\mathbf{B}_2'}(A^T Y + \frac{1}{\beta}X) \ \text{(row-wise)}, \\
X \leftarrow X - \gamma\beta(Z - A^T Y).
\end{cases} \tag{4.8}$$

Here $\beta > 0$ and $\gamma > 0$ are the penalty parameter and the step length respectively as before, $X$ is the primal
variable and $\mathbf{B}_2' := \{Z \in \mathbb{R}^{n \times l} : \|z^i\|_2 \le w_i, \ \text{for } i = 1, \dots, n\}$.
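In matrix form, the row-wise shrinkage used in the primal scheme and the row-wise projection used in the dual scheme both vectorize naturally. The following NumPy sketch is illustrative; the names `row_shrink` and `row_project` and the sample matrix are assumptions.

```python
import numpy as np

def row_shrink(R, t):
    """Row-wise soft thresholding: row i is scaled by max(||r^i|| - t_i, 0) / ||r^i||."""
    norms = np.linalg.norm(R, axis=1)
    safe = np.where(norms > 0, norms, 1.0)          # avoid 0/0 for zero rows
    return (np.maximum(norms - t, 0.0) / safe)[:, None] * R

def row_project(V, w):
    """Project each row of V onto the l2-ball of radius w_i."""
    norms = np.linalg.norm(V, axis=1)
    safe = np.where(norms > 0, norms, 1.0)
    return np.minimum(1.0, w / safe)[:, None] * V

R = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
w = np.ones(3)
S = row_shrink(R, w)     # first row shrunk, the others set to zero
P = row_project(R, w)    # first row scaled to unit norm, the others unchanged
print(S, P, sep="\n")
```

With the same threshold and radius, the two operators are complementary: shrinking a row and projecting it sum back to the original row, which reflects the primal-dual relationship between the two schemes.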

In addition, we can consider a more generalized joint sparsity scenario where each column of the solution
matrix corresponds to a different measurement matrix. Specifically, we consider the following problem:

$$\min_X \ \|X\|_{w,2,1} := \sum_{i=1}^{n} w_i\|x^i\|_2 \quad \text{s.t.} \quad A_j x_j = b_j, \ j = 1, \dots, l, \tag{4.9}$$

where $X \in \mathbb{R}^{n \times l}$, $A_j \in \mathbb{R}^{m_j \times n}$ ($m_j < n$), $b_j \in \mathbb{R}^{m_j}$ and $w_i \ge 0$ for $i = 1, \dots, n$. Likewise, we can reformulate
it in the form of (4.3), just replacing $\tilde{A}$ in (4.2) by

$$\tilde{A} := \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_l \end{pmatrix}, \tag{4.10}$$

and deal with it as a group sparsity problem.


5. Numerical Experiments. In this section, we present numerical results to evaluate the performance
of our proposed ADM algorithms in comparison with the state-of-the-art algorithm SPGL1 (version 1.7) [5].
We tested them on two sets of synthetic data with group sparse solutions and jointly sparse solutions,
respectively. Both speed and solution quality are compared. The numerical experiments were run in MATLAB
7.10.0 on a Dell desktop with an Intel Core 2 Duo 2.80GHz CPU and 2GB of memory.

Several other existing algorithms such as SpaRSA [8], SLEP [6] and block-coordinate descent algorithms
(BCD) [7] have not been included in these experiments for the following reasons. Unlike ADMs and SPGL1
that directly solve the constrained models (1.3) and (1.4), SpaRSA, SLEP and BCD are all designed to
solve the unconstrained problem (1.5). For the unconstrained problem, the choice of the penalty parameter
$\mu$ is a critical issue, affecting both the reconstruction speed and accuracy of these algorithms. In order to
conduct a fair comparison, it is important to make a good choice of the penalty parameter. However, it is
usually difficult to choose and may need heuristic techniques such as continuation. In addition, we notice
that current versions of both SLEP and BCD cannot accept the measurement matrix $A$ as an operator.
Therefore, we mainly compare the ADM algorithms with SPGL1 in the experiments, which will provide
good insight on the behavior of the ADM algorithms. More comprehensive numerical experiments will be
conducted in the future.

5.1. Group Sparsity Experiment. In this experiment, we generate group sparse solutions as follows:
we first randomly partition an $n$-vector $x$ into $s$ groups and then randomly pick $k$ of them as active groups
whose entries are iid random Gaussian while the remaining groups are all zeros. We use randomized partial
Walsh-Hadamard transform matrices as measurement matrices $A \in \mathbb{R}^{m \times n}$. These transform matrices are
suitable for large-scale computation and have the property $AA^T = I$. Fast matrix-vector multiplications
with the partial Walsh-Hadamard matrix $A$ and its transpose $A^T$ are implemented in C with a MATLAB
mex-interface available to all codes compared. We emphasize that on matrices other than Walsh-Hadamard,
similar comparison results are obtained. The problem size is set to $n = 8192$, $m = 2048$ and $s = 1024$. We
test on both noiseless measurement data $b = Ax$ and noisy measurement data with 0.5% additive Gaussian
noise.
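A test problem of this kind can be generated along the following lines. This NumPy sketch makes several assumptions that the text does not specify: the random partition is drawn via a random permutation with random cut points, the measurement matrix is a generic matrix with orthonormal rows obtained from a QR factorization (standing in for the partial Walsh-Hadamard transform), and the sizes are much smaller than those used in the experiments. The names `make_group_sparse` and `orthonormal_rows` are hypothetical.

```python
import numpy as np

def make_group_sparse(n, s, k, rng):
    """Random partition of {0,...,n-1} into s groups, k of which get Gaussian entries."""
    perm = rng.permutation(n)
    cuts = np.sort(rng.choice(np.arange(1, n), size=s - 1, replace=False))
    groups = np.split(perm, cuts)                 # s non-empty index arrays
    x = np.zeros(n)
    for i in rng.choice(s, size=k, replace=False):
        x[groups[i]] = rng.standard_normal(len(groups[i]))
    return x, groups

def orthonormal_rows(m, n, rng):
    """An m-by-n matrix with A A^T = I (a stand-in for a partial transform matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
    return Q.T

rng = np.random.default_rng(0)
n, m, s, k = 64, 24, 8, 3                         # small illustrative sizes
x_true, groups = make_group_sparse(n, s, k, rng)
A = orthonormal_rows(m, n, rng)
b = A @ x_true                                    # noiseless measurements
active = sum(np.linalg.norm(x_true[g]) > 0 for g in groups)
print(len(groups), active)
```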

We set the parameters for the primal-based ADM algorithm (PADM) as follows: $\beta_1 = 0.3/\mathrm{mean}(\mathrm{abs}(b))$,
$\beta_2 = 3/\mathrm{mean}(\mathrm{abs}(b))$ and $\gamma_1 = \gamma_2 = 1.618$. Here we use the MATLAB-type notation mean(abs(b)) to
denote the arithmetic average of the absolute values of the entries of b. For the dual-based ADM algorithm (DADM), we
set $\beta = 2 \cdot \mathrm{mean}(\mathrm{abs}(b))$ and $\gamma = 1.618$. Recall that the step length $1.618 \approx (\sqrt{5} + 1)/2$ is the upper
bound for the theoretical convergence guarantee. We use the default parameter setting for SPGL1 except for setting
proper tolerance values for different tests. Since SPGL1 is designed to solve the constrained denoising
model (1.4), we set its input argument sigma ideally to the true noise magnitude. All the algorithms use
zero as the starting point. The weights in the weighted $\ell_{2,1}$-norm are set to one.
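
As a small illustration of the settings above, the following Python sketch computes the penalty parameters and step lengths from a measurement vector b. The function names and dictionary keys (`beta1`, `beta2`, `gamma1`, `gamma2`, `beta`, `gamma`) are hypothetical labels for the two PADM penalty parameters and step lengths and for the DADM penalty parameter and step length; they are not taken from any released code.

```python
def mean_abs(b):
    """Arithmetic mean of the absolute values of b, i.e. MATLAB's mean(abs(b))."""
    return sum(abs(v) for v in b) / len(b)


def padm_parameters(b):
    """Penalty parameters and step lengths reported in the text for PADM."""
    mu = mean_abs(b)
    return {"beta1": 0.3 / mu, "beta2": 3.0 / mu, "gamma1": 1.618, "gamma2": 1.618}


def dadm_parameters(b):
    """Penalty parameter and step length reported in the text for DADM."""
    mu = mean_abs(b)
    return {"beta": 2.0 * mu, "gamma": 1.618}
```

Note that the primal penalties scale inversely with the average magnitude of b while the dual penalty scales proportionally to it, so rescaling the data rescales the parameters in a consistent way.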


We present two comparison results. Figure 5.1 shows how the relative error decreases as each
algorithm runs for 1000 iterations. Here the number of active groups is fixed at k = 100. Figure 5.2
illustrates the performance of each algorithm in terms of relative error, running time and number of iterations
as k varies from 70 to 110. The ADM algorithms are terminated when $\|x^{(k+1)} - x^{(k)}\| < tol \cdot \|x^{(k)}\|$, i.e.,
when the relative change between two consecutive iterates becomes smaller than the tolerance. The tolerance value tol
is set to $10^{-6}$ for noiseless data and $5 \times 10^{-4}$ for noisy data with 0.5% Gaussian noise. SPGL1 has more
sophisticated stopping criteria. In order to make a fair comparison, we let SPGL1 reach roughly the same
accuracy as PADM and DADM. We empirically tuned the tolerance parameters of SPGL1, namely bpTol,
decTol and optTol, for different sparsity levels, since we found it difficult to set them in a
consistent way. Specifically, we chose a decreasing sequence of tolerance values as the sparsity
level increases. As a result, all the algorithms achieved comparable relative errors.
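
The relative-change stopping test used for the ADM algorithms is simple to state in code. The sketch below is a minimal Python version; the function names are hypothetical, and the Euclidean norm is assumed.

```python
import math


def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))


def relative_change_small(x_new, x_old, tol):
    """Return True when ||x_new - x_old|| < tol * ||x_old||."""
    diff = [p - q for p, q in zip(x_new, x_old)]
    return norm2(diff) < tol * norm2(x_old)
```

Because the inequality is strict, the test cannot fire on the first iteration when the starting point is zero: the right-hand side is then zero, and no norm is strictly below zero.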


5.2. Joint Sparsity Experiment. Jointly sparse solutions $X \in \mathbb{R}^{n \times l}$ are generated by randomly
selecting k rows to have i.i.d. Gaussian entries and setting the other rows to zero. Randomized
partial Walsh-Hadamard transform matrices are used as measurement matrices $A \in \mathbb{R}^{m \times n}$. Here, we set
n = 1024, m = 256 and l = 16. The parameter setting for each algorithm is the same as described in
Section 5.1.
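
For concreteness, a jointly sparse test matrix of this kind can be generated as in the following Python sketch. The function names are hypothetical, the matrix is stored as a list of rows, and the row-wise mixed norm is computed with unit weights, matching the experimental setting.

```python
import math
import random


def make_joint_sparse(n, l, k, rng):
    """Return an n-by-l matrix (list of rows) with k random rows filled with N(0, 1) entries."""
    support = sorted(rng.sample(range(n), k))
    X = [[0.0] * l for _ in range(n)]
    for i in support:
        X[i] = [rng.gauss(0.0, 1.0) for _ in range(l)]
    return X, support


def l21_norm(X):
    """Sum of the Euclidean norms of the rows of X (unit weights)."""
    return sum(math.sqrt(sum(t * t for t in row)) for row in X)
```

Each row of X forms one group, so joint sparsity is the special case of group sparsity in which all groups have the same size l and do not overlap.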

Similarly, we perform two classes of numerical tests. In one test, we fix the number of nonzero rows at
k = 115 and study the rate at which the relative error decreases for each algorithm, as shown in Figure 5.3. In the
other test, we set proper stopping tolerance values as described in Section 5.1 so that all the algorithms
reach comparable relative errors. Then we compare the CPU time and number of iterations consumed by
each algorithm for different sparsity levels from k = 100 to 120. The result is presented in Figure 5.4.
[Figure 5.1: two panels, "Noiseless Data" (left) and "0.5% Gaussian Noise" (right).]

Fig. 5.1. Convergence rate results of PADM, DADM and SPGL1 on the $\ell_{2,1}$-regularized group sparsity problem (n = 8192,
m = 2048, s = 1024 and k = 100). The x-axes represent the number of iterations, and the y-axes represent relative error. The left
plot corresponds to noiseless data and the right plot corresponds to data with 0.5% Gaussian noise. The results are averages of
50 runs.


Fig. 5.2. Comparison results of PADM, DADM and SPGL1 on the $\ell_{2,1}$-regularized group sparsity problem (n = 8192,
m = 2048 and s = 1024). The x-axes represent the number of nonzero groups, and the y-axes represent relative error, CPU time
and number of iterations (from left to right). The top row corresponds to noiseless data and the bottom row corresponds to
data with 0.5% Gaussian noise. The results are averages of 50 runs.
[Figure 5.3: two panels, "Noiseless Data" (left) and "0.5% Gaussian Noise" (right).]

Fig. 5.3. Convergence rate results of PADM, DADM and SPGL1 on the $\ell_{2,1}$-regularized joint sparsity problem (n = 1024,
m = 256, l = 16 and k = 115). The x-axes represent the number of iterations, and the y-axes represent relative error. The left
plot corresponds to noiseless data and the right plot corresponds to data with 0.5% Gaussian noise. The results are averages of
50 runs.


[Figure 5.4: panels titled "Noiseless Data"; x-axes labeled "Sparsity Level k".]

Fig. 5.4. Comparison results of PADM, DADM and SPGL1 on the $\ell_{2,1}$-regularized joint sparsity problem (n = 1024, m = 256
and l = 16). The x-axes represent the number of nonzero rows, and the y-axes represent relative error, CPU time and number of
iterations (from left to right). The top row corresponds to noiseless data and the bottom row corresponds to data with 0.5%
Gaussian noise. The results are averages of 50 runs.
5.3. Discussions. As we can see from both Figure 5.1 and Figure 5.3, the relative error curves produced
by PADM and DADM almost coincide and fall quickly below the one produced by SPGL1 in both the noiseless and noisy
cases. In other words, the ADM algorithms decrease the relative error much faster than SPGL1. With noiseless
data, the ADM algorithms reach machine precision $10^{-16}$ after 200 to 300 iterations, whereas SPGL1 attains
only $10^{-5}$ to $10^{-6}$ accuracy after 1000 iterations. Although SPGL1 will reach machine precision eventually,
it needs far more iterations.


When the data contains noise, high accuracy is generally not achievable. With 0.5% additive Gaussian
noise, all the algorithms converge to the same relative error level of around $10^{-2}$. However, we can observe that
the ADM algorithms and SPGL1 have different solution paths. While SPGL1 decreases the relative error
almost monotonically, the relative error curves of the ADM algorithms exhibit a "down-then-up" behavior.
Specifically, their relative error curves first go down quickly and reach the lowest level around $5 \times 10^{-3}$, but
then they start to go up a bit until convergence. This "down-then-up" phenomenon occurs because the optima of
the $\ell_{2,1}$-problem with erroneous data may not necessarily yield the best solution quality. In fact, the ADM
algorithms keep decreasing the objective values even as the relative errors start to increase. In
other words, the ADM algorithms may give a better solution if they are stopped properly prior to convergence.
We can see that SPGL1 takes approximately 200 iterations to decrease the relative error to $10^{-2}$, while the
ADM algorithms need only 30 iterations to reach even higher accuracy.
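
The "down-then-up" behavior can be located programmatically when the true signal is known, as it is in synthetic experiments. The Python sketch below is illustrative only: it assumes the relative error is defined as the Euclidean distance to the true signal divided by the norm of the true signal, which the text does not state explicitly, and the function names are hypothetical.

```python
import math


def relative_error(x, x_true):
    """Assumed definition: ||x - x_true|| / ||x_true|| in the Euclidean norm."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, x_true)))
    den = math.sqrt(sum(q * q for q in x_true))
    return num / den


def best_iterate(errors):
    """Index of the smallest value in a recorded trace of relative errors."""
    return min(range(len(errors)), key=errors.__getitem__)
```

In practice the true signal is unavailable, so a stopping tolerance matched to the noise level plays the role of `best_iterate`.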


To further assess the efficiency of the ADM algorithms, we compare relative error,
CPU time and number of iterations for different sparsity levels. As can be seen in Figure 5.2 and Figure 5.4,
PADM and DADM have very similar performance, though DADM is often slightly faster. They both exhibit
good stability, attaining the desired accuracy with roughly the same number of iterations over different
sparsity levels. Although SPGL1 can also stably reach comparable accuracy, it consumes substantially more
iterations as the sparsity level increases. For noisy data, the ADM algorithms obtain slightly higher accuracy
than SPGL1, which is due to the different solution paths shown in Figure 5.1 and Figure 5.3. For the
ADM algorithms, recall that the stopping tolerance for relative change is set to $5 \times 10^{-4}$. With
tolerance values of this kind, consistent with the noise level, the ADM algorithms are often terminated near the
point with the lowest relative error. In contrast, SPGL1 could hardly lower its relative error further by using
different tolerance values.


Moreover, we can observe that the speed advantage of the ADM algorithms over SPGL1 is significant.
Notice that the dominating computational load for all the algorithms is matrix-vector multiplication. For
both PADM and DADM, the number of matrix-vector multiplications is two per iteration. The number
used by SPGL1 may vary from iteration to iteration, and is usually more than two per iteration on average. Compared to
SPGL1, the ADM algorithms not only consume fewer iterations to obtain the same or even higher accuracy,
but are also less computationally expensive at each iteration. Therefore, the ADM algorithms are much
faster in terms of CPU time, especially as the sparsity level increases. For noiseless data, we observe that the
ADM algorithms are 2 to 3 orders of magnitude faster than SPGL1. From Figure 5.1 and Figure 5.3 it is
clear that the speed advantage will be even more significant for higher accuracy. For noisy data, the
ADM algorithms gain a 3 to 8 times speedup over SPGL1.
6. Conclusion. We have proposed efficient alternating direction methods for group sparse optimization
using $\ell_{2,1}$-regularization. General group configurations such as overlapping groups and incomplete covers are
allowed. The convergence of these ADM algorithms is guaranteed by the existing theory if one minimizes a
convex quadratic function exactly at each iteration. When the measurement matrix A is a partial transform
matrix that has orthonormal rows, the main computational cost is only two matrix-vector multiplications
per iteration. In addition, such a matrix A can be treated as a linear operator without explicit storage,
which is particularly desirable for large-scale computation. For a general matrix A, solving a linear system is
additionally needed. Alternatively, we may choose to minimize the quadratics approximately, e.g., by taking a
steepest descent step. Empirical evidence has led us to believe that for this latter case, a convergence guarantee
should still hold under certain conditions on the step lengths $\gamma_1$, $\gamma_2$ (or $\gamma$). Our numerical results have
demonstrated the effectiveness of the ADM algorithms for group and joint sparse solution reconstructions.
In particular, our implementations of the ADM algorithms exhibit a clear and significant speed advantage
over the state-of-the-art solver SPGL1. Moreover, it has been observed that, at least on random problems,
the ADM algorithms are capable of achieving a higher solution quality than SPGL1 when the data contains
noise.